ABSTRACT

CNG repeats (where N denotes one of the four natu- 
ral nucleotides) are abundant in the human genome. 
Their tendency to undergo expansion can lead to 
hereditary diseases known as TREDs (trinucleotide 
repeat expansion disorders). The toxic factor can be 
protein, if the abnormal gene is expressed, or the 
gene transcript, or both. The gene transcripts have 
attracted much attention in the biomedical commu- 
nity, but their molecular structures have only recently 
been investigated. Model RNA molecules compris- 
ing CNG repeats fold into long hairpins whose stems 
generally conform to an A-type helix, in which the 
non-canonical N-N pairs are flanked by C-G and G- 
C pairs. Each homobasic pair is accommodated in 
the helical context in a unique manner, with con- 
sequences for the local helical parameters, solvent 
structure, electrostatic potential and potential to in- 
teract with ligands. The detailed three-dimensional 
profiles of RNA CNG repeats can be used in screen- 
ing of compound libraries for potential therapeutics 
and in structure-based drug design. Here is a brief 
survey of the CNG structures published to date. 

INTRODUCTION 

Trinucleotide repeats (TNRs) are a class of microsatellite 
sequences abundant in the intergenic as well as genetic re- 
gions, including open reading frames. More than 30 000 
TNRs (six repeated units or more) have been found in the 
human genome (1). Similar to other microsatellites, TNR 
sequences exhibit high variability in length between individuals.
The mutability of TNR length and the overrepresentation
of TNRs in genes suggest that they can be regulatory elements
or a source of evolutionary change, perhaps fine-tuning
gene expression (2). These TNR features can be functional
in a normal organism, but they can become deleterious 
when abnormal lengthening of repeat units occurs. This 
is observed in humans with neurological diseases known 
as TREDs (trinucleotide repeat expansion disorders). One 
major subset of pathogenic TNRs is the CNG repeats, 
where N represents one of the four natural nucleotides. 
CNG repeats are associated with at least 15 diseases, such
as myotonic dystrophy (DM), Huntington's disease, several
spinocerebellar ataxias (SCA) and fragile X mental retarda- 
tion syndrome (FXS) (3). 
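
As a rough illustration of the counting criterion mentioned above (six repeated units or more), the following Python sketch scans a DNA string for tandem CNG tracts. The function name `find_cng_tracts`, the regular expression and the toy sequence are assumptions made for this example; this is not the procedure used in reference (1).

```python
import re

def find_cng_tracts(seq, min_units=6):
    """Return (start, unit, n_units) for each tandem CNG tract in `seq`.

    A tract is one CNG unit (C, any nucleotide, G) followed by at least
    `min_units - 1` further copies of exactly the same unit.
    """
    pattern = re.compile(r"(C[ACGT]G)\1{%d,}" % (min_units - 1))
    tracts = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        n_units = len(match.group(0)) // 3
        tracts.append((match.start(), unit, n_units))
    return tracts

# Hypothetical toy sequence: a (CTG)7 tract and a (CAG)3 tract that is too short.
toy = "AAT" + "CTG" * 7 + "TTA" + "CAG" * 3 + "GG"
print(find_cng_tracts(toy))
```

Only the (CTG)7 tract passes the six-unit threshold in this toy input; lowering `min_units` would also report the shorter CAG run.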

The current models of the pathomechanism of TREDs 
postulate two toxic agents: RNA and protein (4); however, 
recent reports also speculate about DNA toxicity (see re- 
views (2,5)). The RNA-mediated mechanism is based on 
the observation that long CNG repeats are transcribed and 
included in mRNA. There, they form hairpin structures 
that exhibit a gain-of-function abnormality by sequester- 
ing and accumulating regulatory proteins. This upsets the 
fine balance of cellular processes (6-8) and results in re- 
tention of the RNA and proteins in the nucleus, where 
they form nuclear foci (see review (9)). The second path- 
omechanism mostly concerns the toxicity of protein. The 
mutated protein contains elongated polyglutamine (polyQ) 
tracts, which result in the misfolding and aggregation of 
the protein (10). Most polyQ-containing proteins are in- 
volved in DNA-dependent regulation of transcription or 
neurogenesis. Moreover, they participate in multiple inter- 
molecular contacts. Similar to the RNA-mediated mech- 
anism, the polyQ-expanded region induces pathogenic in- 
teractions with proteins, leading to the formation of toxic 
mono- and oligomers. Subsequently, amyloid-like inclu- 
sions are formed, sequestering all engaged proteins (9). 

In recent reviews, the sharp division between these pathomechanisms
has started to blur and the picture has become more complex.
In addition to the main mechanism of the development of some
TREDs, another parallel toxic path has been suggested to 
coexist (see reviews (11-13)). This path is associated with 
bidirectional transcription, which normally results in an- 
tisense RNA involved in the regulation of gene expres- 
sion. In cells affected with TREDs, the antisense transcripts 
can contain an extended run of CNG repeats complemen- 
tary to the sense strand. Subsequently, the antisense RNA 
can undergo repeat-associated non-ATG (RAN) transla- 
tion that occurs independently of an ATG initiation codon 
(14-16). The small expanded peptides can exhibit toxicity, 
similar to polyQ diseases. Another possibility is that anti- 
sense RNA can hybridize to sense RNA and form double- 
stranded structures, which can be processed into siRNA, 
derived from triplet repeats, activating the silencing mech- 
anism (17,18). We cannot exclude that such RNAi is also 
generated from the hairpins formed by sense CNG transcripts
(19-21). Bidirectional transcription is also associated
with a path of DNA toxicity, and the convergent transcription
from both strands triggers cell death (22). The
double-bubble, which is formed at the expanded repeats by
the collision of the sense and antisense transcription, is necessary
for the induction of apoptosis (2).

To summarize, in recent years the pathomechanism of 
TREDs has been intensely investigated. A range of molec- 
ular biology techniques was used in in vitro and ex vivo 
systems and in in vivo models. Additionally, crystallogra- 
phy made a contribution to the study of TREDs. A num- 
ber of structures have been reported in the last 10 years, 
mostly concerning RNA-mediated pathogenesis, and these 
findings have not yet been summarized. In this review, we 
present crystallographic models of toxic RNA and give an 
overall view of the structural features of the CNG repeats. 
We begin with a short description of the diseases associated 
with each type of repeat, followed by an introduction into 
the structural studies, including a short characterisation of 
the secondary structure of the DNA and/or RNA. Next, 
all crystallographic reports of RNA-mediated molecules are
presented and discussed, which is followed by a description 
of the NMR and thermodynamic studies. 

TOXICITY OF EXPANDED CNG REPEATS 

Abnormally expanded CUG repeats are best known for the 
multiple system dysfunctions that they cause in myotonic 
dystrophy type 1 (DM1) (23). The mutation leading to DM1 
is the expansion of a CTG repeat in the 3'-untranslated re- 
gion (3'-UTR) of the dystrophia myotonica protein kinase 
(DMPK) gene. The normal length of 5-37 CTG repeats is 
expanded in DM1 to 50-3000 repeats (24) and entails a mis- 
regulation of alternative splicing of several developmentally 
regulated transcripts (25,26). This misregulation is caused 
by altered interactions of the transcripts with two antago- 
nistic splicing regulators: the CUG repeat binding protein 
(CUG-BP) (27) and the muscleblind-like (MBNL1) protein
(28). The level of MBNL1 decreases as it is sequestered to
nuclear foci (29,30), while the level of CUG-BP increases 
(31). 

Elongated CAG repeats are best known to cause the poly- 
Q diseases, in which toxicity is attributed to malformed pro- 
teins derived from the affected genes (32,33). However, re- 
cent studies indicate that mutant transcripts can also con- 
tribute to pathogenesis (34,35). CAG repeats in transcripts 
are similar to CUG repeats in their interaction with the 
splicing regulator MBNL1 (34,36), whose sequestration by
CUG tracts causes splicing aberrations leading to DM1 
(28,37) and spinocerebellar ataxia type 8 (SCA8) (38). CAG 
repeats trigger neurodegeneration when introduced into a 
Drosophila SCA3 model (39). In all, expanded CAG repeats 
are the cause of nine human neurodegenerative disorders, 
including Huntington's disease and several spinocerebellar 
ataxias (40). 

Expansion of CGG repeats in the 5'-untranslated region
(5'-UTR) of the fragile X mental retardation gene (FMR1)
is associated with several phenotypes of increasingly severe 
pathology, depending on the extent of elongation (41). The 
normal range found in the population is 5-54 CGG repeats 
(41-43), where the upper region of 45-54, defined as the
'grey zone', carries an increased likelihood of pathogenic expansion
in descendants (41,44). Tracts of 55-200 CGGs are
premutations and can cause a progressive RNA-mediated 
neurodegenerative disorder known as fragile X-associated 
tremor ataxia syndrome (FXTAS) (45,46) in elderly males, 
while females carrying the premutation are at risk of developing
premature ovarian failure (47). More than 200 CGG
repeats are full mutations resulting in fragile X syndrome 
(FXS), the most common inherited mental retardation syn- 
drome in man (48). 
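
The repeat-length ranges quoted in this paragraph can be written as a simple lookup. The Python sketch below is only an illustration of those thresholds; the function name and the returned labels are hypothetical, and lengths below five repeats, which the text does not discuss, are reported as such.

```python
def classify_fmr1_cgg(n_repeats):
    """Classify an FMR1 CGG tract length using the ranges quoted in the text."""
    if n_repeats > 200:
        return "full mutation"          # more than 200 repeats (FXS)
    if n_repeats >= 55:
        return "premutation"            # 55-200 repeats
    if n_repeats >= 45:
        return "normal (grey zone)"     # 45-54, upper part of the normal range
    if n_repeats >= 5:
        return "normal"                 # 5-44
    return "below the range quoted"     # not covered by the text

for n in (30, 50, 120, 250):
    print(n, classify_fmr1_cgg(n))
```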

CCG repeats are highly overrepresented in exons of the human
genome and are typically located in the 5'-UTR or in
the translated regions (1). Their role in pathogenesis is rel- 
atively obscure compared to the other CNG repeats, but 
they have been associated with three trinucleotide repeat
disorders: Huntington's disease (49), myotonic dystrophy
type 1 (50) and chromosome X-linked mental retardation
(FRAXE) (51).

SECONDARY STRUCTURE OF CNG REPEATS 

The secondary structure of RNA containing CNG repeats 
has been investigated extensively using in vitro models (for 
more detail, see reviews (52,53)). The models studied in- 
cluded RNA comprising not only CNG repeats but also 
transcripts of native mRNA. Digestion by nuclease S1 and
ribonucleases T1, T2 and V1 as well as lead cleavage of
oligomers comprising 17 CNG repeats indicated that all 
CNG repeats formed hairpin structures and that these hair- 
pins showed several alternative alignments co-existing un- 
der non-denaturing polyacrylamide gel electrophoresis con- 
ditions (54). Adding terminal C-G 'clamps' reduced this 
micro-heterogeneity. CNG hairpins clamped by six C-G 
pairs showed only one alignment. The hairpins comprising
an even number of CNG repeats formed apical loops of
four nucleotide (nt) residues, as indicated by susceptibility
to nuclease digestion. In the case of odd-numbered repeats,
CAG and CCG showed 7-nt loops, while CUG and CGG
formed tighter 3-nt loops.

Further biophysical and biochemical studies to ascertain 
the structural diversity of a wider range of RNA triplet re- 
peats categorized all CNG repeats as 'fairly stable hairpins', 
of which CGG was the most stable and CCG the least stable
(55). Addressing the question of the possibility of CGG
repeats forming quadruplexes, the authors found no evidence
for their formation and concluded that the properties
of CGG repeats did not essentially deviate from those of
CUG, CCG and CAG repeats.

A more focussed study was carried out on CAG repeats 
in transcripts related to human diseases: the spinocerebel- 
lar ataxia types 3 and 6 and dentatorubral-pallidoluysian 
atrophy. The cleavage patterns of the transcripts, obtained 
by digestion with lead and a variety of ribonucleases, indicated
the formation of several variants of a slipped hairpin
(56). The study also demonstrated that a single-nucleotide
polymorphism found near the CAG repeat modulated the
structure. A related paper addressed the effect of naturally
occurring 'interruptions' in an expanded CAG repeat in
the coding sequence of the spinocerebellar ataxia type 1
gene (57). The cleavage patterns indicated that the interruptions
destabilized the hairpin structure, causing specific bulging
and branching. This was proposed to mitigate the onset
of pathogenesis. A similar investigation of CAG-repeat-containing
transcripts related to spinocerebellar ataxia type
2 revealed the structure-modulating role of naturally occurring
CAA interruptions (58). Single-nucleotide polymorphism
was also observed in the CGG repeats in FMR1 gene
transcripts. Single AGG interruptions changed the folding
of the 5'-UTR fragments, resulting in branched hairpin
structures (43).

DNA containing an extended number of CNG repeats 
forms non-B DNA structures (59), which can occur during 
replication, transcription, recombination and repair. During
those processes, the two parental strands are separated and 
single-stranded DNA structures can be formed. The secondary
structure of isolated dCNG repeats has been studied
by a variety of methods, such as electrophoretic mobility
assay, UV absorbance and chemical or enzymatic digestion
(see review (60)). Similar to RNA, all four types of
isolated repeats form hairpin structures. The G-rich dCGG 
repeats also have the potential to form tetraplexes. In the 
presence of K+ ions, the d(CGG)20 oligomer exhibited increased
electrophoretic mobility, suggesting that it formed
an intramolecular tetraplex (a hairpin folded in half). This 
result was confirmed by CD, NMR and UV spectroscopy 
(59). 

To date, there is only one direct line of evidence showing 
that the DNA hairpin structures are formed in vivo. Two 
distinct zinc finger nucleases (ZFNs) were engineered to recognize
specifically the stem of a hairpin formed by CAG
or CTG repeats (61). The nucleases were expressed in cells 
containing an extended number of repeats. As a result, a 
contraction of the repeated sequence was observed, indicat- 
ing the presence of hairpins. Moreover, nuclease activity was 
detected only in an active replication state and only in cells 
harbouring 45 or 102 repeated units. 

X-RAY CRYSTALLOGRAPHIC STUDIES 

Crystal structures have been published of RNA oligomers 
containing all four types of CNG repeats. 

CUG repeats 

The earliest crystallographic study of a trinucleotide repeat 
targeted the structure of an oligomer comprising six CUG 
repeats (PDB code 1zev) (62). The authors found that the
RNA is double-helical, having overall characteristics of the
A-form, in which the C-G and G-C base pairs have U-U
'mismatches' in between (Figure 1A). The non-canonical
pairing seemed not to distort the backbone from the A- 
helix. Closer observations of the inter-strand interactions, 
in particular the details of the non-canonical base-pairing 
and the solvent structure, were prevented by an apparent 
superposition of molecules in the crystal lattice. A different
study revealed duplexes of G(CUG)2C at a resolution
of 1.23 Å (Figure 1B, PDB code 3glp) (63). This RNA
was also in the A-form with the helical twist in the typical
range of 32-34°. The crystal lattice comprised three
duplexes, crystallographically independent but with simi- 
lar structures, stacked end-to-end to form pseudo-infinite 
helices. This arrangement is frequently found in crystals of 
nucleic acids. The most notable features of this structure are

Figure 2. Non-canonical N-N pairs within duplexes formed by CNG repeats:
U-U (A), A-A (B), G-G (C) and C-C (D). Strand separation is indicated,
measured as a distance between the C1' atoms. Surfaces especially
liable to interact with the solvent through H-bonding are indicated with
arcs. Representative N-N pairs with the 2Fo-Fc electron density, contoured
at the 1σ level, are shown (right) based on PDB entries 3glp, 3nj6, 3r1c, 4e59,
respectively.



the clearly resolved interactions within the duplex and the 
well-defined solvent structure. While the C-G and G-C pairs 
formed standard Watson-Crick pairing, the U-U pairs in- 
teracted in a unique way. The chemical symmetry of the U- 
U pair is broken in the three-dimensional structure, which 
has one of the uridines inclined toward the minor groove 
to make a single hydrogen bond between its carbonyl O4
atom and the N3 imino group of the opposite base (Figure
2A). This interaction appeared 'stretched' compared to the
consensus structure derived from previously observed U-U
pairs (64) that had two hydrogen bonds. This stretching is
clear when the C1'-C1' distance between the two opposite
uridine residues is considered. It amounts to 10.4 Å in the
context of the CUG repeat, compared to the general consensus
of 8.6 Å. A search using FRABASE (65) and FR3D
(66) showed stretched U-U pairs in only two other struc- 
tures: tRNA-Gln, in which the U-U pair closes the anti- 
codon loop, and the A site of 16S rRNA, in which the helix 




Figure 3. Duplexes containing CNG pairs. Local unwinding and subse- 
quent widening of the major groove can be seen in the vicinity of A-A and 
G-G pairs. 



is flanked by a G-C pair at the 5' side and a C-G pair at the 
3' side. 
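
The strand separation used in this comparison is simply the Euclidean distance between the two C1' atoms of a pair. The Python sketch below shows the calculation and compares the result with the two reference values quoted in the text (8.6 Å for the consensus U-U pair and 10.4 Å in the CUG context). The coordinates and the function names are hypothetical, chosen for illustration only; they are not taken from any PDB entry.

```python
import math

# Reference C1'-C1' distances quoted in the text, in angstroms.
CONSENSUS_UU = 8.6    # consensus of previously observed U-U pairs
STRETCHED_UU = 10.4   # U-U pair in the context of CUG repeats

def c1_distance(c1_a, c1_b):
    """Euclidean distance between two C1' atoms given as (x, y, z) in angstroms."""
    return math.dist(c1_a, c1_b)

def nearest_reference(distance):
    """Label a distance by whichever of the two reference values is closer."""
    if abs(distance - STRETCHED_UU) < abs(distance - CONSENSUS_UU):
        return "stretched"
    return "consensus"

# Hypothetical C1' coordinates of two paired uridines, for illustration only.
c1_u1 = (1.0, 2.0, 3.0)
c1_u2 = (7.0, 10.0, 6.5)
d = c1_distance(c1_u1, c1_u2)
print(f"{d:.2f} A -> {nearest_reference(d)}")
```

In practice the coordinates would be read from the ATOM records of a deposited structure; that step is omitted here.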

What keeps the uridines apart and prevents them from re- 
alising their full base-pairing potential? It is likely that this 
non-canonical pairing is stabilized by the strong flanking 
C-G and G-C pairs, which maintain the duplex in a clearly 
recognisable A-form (Figure 3A). The stability of the U-U 
pairing is reinforced by ordered water molecules in the ma- 
jor and minor grooves. These waters contribute to the H- 
bonding network and can be considered a part of the struc- 
ture. Duplexes comprising CUG repeats have a character- 
istic pattern of surface electrostatic potential in their minor 
groove (Figure 4) that comprises alternating bands of pos- 
itive and negative potential along the direction of the helix 
axis. The major groove shows no regularity and is predomi- 
nantly electronegative with positive patches. The hydrogen- 
bonding potential of the duplexes is demonstrated by their 
interactions with ordered water molecules and small lig- 
ands present in the crystallisation medium (glycerol and sul- 
phate ions). These interactions could be used as a guide in 
structure-based drug design. The paper also includes a new 
analysis of the data obtained by Mooers et al. (62). After 
detwinning the diffraction intensities, the structure of the 
[(CUG)6]2 duplex (PDB code 3gm7) corresponds closely 
with the structure of the shorter duplexes. 

Given that each U-U pair has two possible conformations,
depending on which uridine is inclined, the question
arises: is the choice random or is there a correlation between
the conformations of adjacent pairs? The structures
included in this study indicate that there is no correlation. 
This means that despite the chemical symmetry and the 
palindromic nature of duplexes comprising CUG repeats, 
the number of possible conformations is large and grows
rapidly for longer runs: approximately as 2^(N/2) for N repeats.
The authors suggested this as a way of explaining how
long runs of CUG repeats could differ from short runs and 
condense into nuclear foci, as opposed to shorter repeat se- 
quences that remain soluble. 
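
To give a feeling for how quickly this number grows, the short Python sketch below evaluates the count under the reading that N repeats folded into a double-helical stem contain about N/2 U-U pairs, each independently adopting one of two conformations, i.e. 2^(N/2) arrangements. The function name is hypothetical and the independence of adjacent pairs is taken from the observation reported above; this is an illustration, not a calculation from the cited paper.

```python
def uu_conformation_count(n_repeats, states_per_pair=2):
    """Approximate number of stem conformations for a run of CUG repeats.

    Assumes roughly n_repeats // 2 U-U pairs in the folded stem and
    independent choice of conformation for every pair.
    """
    n_pairs = n_repeats // 2
    return states_per_pair ** n_pairs

for n in (6, 20, 50, 100):
    print(n, uu_conformation_count(n))
```

A run of six repeats gives only eight arrangements, whereas 100 repeats give 2^50, more than 10^15.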

Another study of a crystal structure containing CUG re- 
peats reported a wider range of conformations of the U- 
U pairs, including arrangements with zero, one or two H- 
bonds (67). Two crystal forms were reported of a double- 
stranded construct containing three CUG repeats and 
5'-UU dangling ends (Figure 1C, D; PDB codes 3szx, 3syw).
In both crystal forms, the central CUGs had the uracil

Figure 4. Electrostatic surface potential of three consecutive
(GCUGCUGC)2 duplexes. Red is negative, blue is positive. The
minor groove shows a striped pattern of surface potential characteristic
of CNG repeats. Based on PDB entry 3gm7.



rings symmetrically opposed and apparently too far apart 
to form any H-bonds. In the flanking repeats, the two struc- 
tures differed. In one of them, the first U-U pair formed a 
single O4-N3 H-bond, while the other pair made a tenuous
3.6 Å contact. The distance between the RNA strands was
at least 10 Å and the dangling ends interacted with neighbouring
duplexes to form a crystal lattice of pseudo-infinite
helices. In the other crystal form, the dangling ends were 
tucked in the major groove and the width of the groove was 
increased, while the distance between the strands decreased 
below 10 Å, which allowed the interacting uridines to form
two hydrogen bonds. Based on the structural variation in 
the crystal, the authors postulated that the U-U pairs could 
sample multiple conformations in vivo, which has implica- 
tions for the recognition by proteins and small ligands. 

A crystal structure of G(CUG)6C 20-mers forming blunt-ended
double helices was described recently (Figure 1E)
(68). The crystal lattice contained two distinct duplexes of 
which one was located on a crystallographic 2-fold axis 
(PDB code 4e48). Thus, the structure contained nine dis- 
tinct double-helical CUG repeats. The authors took the op- 
portunity to compare the nine U-U pairs, analyse the spread 
of conformations and thus gain insight into the dynamics of 
such pairing. They found that the 'stretched U-U wobble' 
with a single H-bond and associated semi-conserved wa- 
ter structure was the predominant mode of U-U pairing (7 
instances), but they also found interesting deviations. One 
U-U pair exhibited a clearly symmetric pairing mode, de- 
scribed as 'symmetric H-nonbonded U-U pairing', which 
the authors proposed as an intermediate conformation be- 
tween two predominant asymmetric pairings. The different 
modes of interactions displayed by the U-U pairs could re- 
sult in the adaptability of CUG repeats in the protein-bound 
state. The authors concluded that 'generally speaking, the 
U-U pairs just follow the requirements of the system'. 

The most recent report regarding the CUG series presents 
the crystal structure of an RNA construct containing a tan- 
dem GAAA tetraloop and its receptor that form the tip and 
the upper part of a hairpin, respectively, while the lower 
part of the stem is a double-helical segment of two CUG
repeats (Figure 1F, PDB code 4fnj) (69). The combination
of the tetraloop with the receptor facilitated crystallisation,
which enabled the examination of yet another instance of 
two consecutive CUG repeats. They formed a helix whose 
geometry also corresponded to the A-form. The C-G and 
G-C pairs were standard Watson-Crick pairs, while the two 
U-U pairs displayed two of the previously observed con- 
formations: one with one hydrogen bond and one with two. 
The authors catalogued all U-U conformations observed to 
date and observed that the most common U-U pairing in 
the context of CUG repeats was the one with one H-bond 
and a strand separation of at least 10 Å, consistent with the
geometry of the A-form. The authors concluded that 'U-U 
pairs are dynamic'. This is an understandable statement, but 
it requires some caution. One could postulate that the mul- 
tiple conformations are 'snapshots' of a dynamic system, 
but in crystallography, we do not actually observe the tran- 
sitions between the different instances and therefore cannot 
be clear about their nature — in particular, their frequency 
and the conditions upon which they depend.

CAG repeats

Two crystal forms of (GGCAGCAGCC)2 have been reported
(70). One of the structures was solved at atomic resolution
(0.95 Å) and contained a duplex with the two strands
related by crystallographic symmetry (Figure 1G, PDB
code 3nj6). The other structure was analysed at medium
resolution (1.9 Å) and contained three crystallographically
independent duplexes (PDB code 3nj7). All duplexes were 
closely superposable and possessed general characteristics 
of the A-form. The A-A pairs, embedded between the 
canonical C-G and G-C pairs, had both residues in the 
anti conformation (Figure 2B). Clashing between the large 
purine rings was avoided by shifting of the residues towards 
the major groove. One was shifted more than the other 
and this was described as the 'thumbs-up' conformation. 
The mutual positioning of the adenines allowed one weak
C2-H2···N1 hydrogen bond. To our knowledge, such pairing
between adenosine residues had not been described. The
non-canonical character of the interaction resulted in local
distortions of the helix geometry - in particular, in the flipping
of the O5' atom of the backbone of the neighbouring
guanosine, due to a rotation of the C5'-O5' bond. This resulted
in a local unwinding of the helix. Although accommodation
of the bulky adenine rings within the helical context
seems to be 'sterically demanding', the inter-strand C1'-C1'
distances (11 Å) for the adenosines were only slightly
larger than average for the A-form. The unsaturated H-bonding
potential of the A-A pair (both exo-amino groups
and one N1 atom) was externalized in the major groove
where it attracted a sulphate anion from the crystallisation 
medium. Patches of positive potential on the predominantly 
negative interior of the major groove were clearly visible on 
the surface electrostatic potential. The minor groove displayed
a banded pattern of alternating positive and
negative potential, similar to that seen in CUG repeats.

In another study, three consecutive CAG repeats within 
flanking sequences were described (Figure 1H) (71). The
middle A-A pair had both adenosines in the anti conformation
and the two adenine rings were modelled as being
symmetrically opposed with the N1 atoms and the C2H2
groups in unlikely close contact. Examination of the electron
density (see Appendix), calculated on the basis of structure
factors deposited by the authors in the PDB (accession
code 4j50), indicates disorder, which could be modelled as
two alternative conformations of the A-A pair. In each of
the conformations, one adenine would be at more of an incline
than the other, so that C2H2 would be opposite N1
and a hydrogen bond could form between them, similar to
the structures reported before (70). Uninterpreted electron
density in the major groove near the A-A pair could indicate
a sulphate ion interacting with the exo-amino groups, as in
the previous paper. The flanking A-A pairs were described
as syn-anti. The adenine in the anti conformation has clear 
electron density, while the residue in the syn conformation 
appears to be rather disordered. There are significant resid- 
ual peaks in the difference electron density map associated 
with the adenine ring, and the sugar-phosphate backbone 
is weak in this region. It is not clear how this should be 
modelled. The adenosine in the syn conformation seems to 



interact with the overhanging disordered uridine from the 
opposite strand that is tucked in the major groove. 

CGG repeats 

Three crystal structures of short CGG-containing 
oligomers have been published (Figure 1I-K) (72). A
duplex of G(CGG)2C crystallized with a remarkable
18 distinct duplexes in the unit cell, all arranged in the
typical end-to-end manner, in semi-infinite columns (PDB
code 3r1c). The other two crystal structures contained
guanosine residues brominated at position 8 (PDB codes
3r1d and 3r1e). All helices had the A-form, with some local
deviations. In all G-G pairs, one guanosine was syn and
the other was anti, with two hydrogen bonds between the
Watson-Crick edge of G(anti) and the Hoogsteen edge of
G(syn): O6···N1H and N7···N2H (Figure 2C). This type of
interaction is common for G-G pairs and has been observed
in many NMR and crystallographic structures. The G(syn)
residues also showed unusual α and γ backbone torsion
angles, which resulted in a local unwinding of the helix,
which seemed to be compensated for elsewhere (Figure 3C).
The G(syn)-G(anti) pairs had a characteristic hydration
pattern and, in addition, attracted charged species from the 
solvent, especially to the exposed Watson-Crick edges in the 
major groove. Sulphate anions and hydrated calcium ions, 
present in the crystallisation medium, interacted with the 
paired residues. The syn-anti arrangement avoids a steric 
clash of the two bulky guanines within the helical structure, 
and the C1'-C1' distance between them (11.3 Å on average)
is only slightly longer than that for the canonical C-G 
and G-C flanking pairs. The brominated guanosines were 
always in the syn conformation, which increased the order 
of the G-G pair by restricting its conformational freedom. 
The electrostatic potential surface displayed the already 
familiar banded pattern of positive and negative character 
in the minor groove. This is due primarily to the C-G and 
G-C pairs. The potential in the major groove was irregular 
and matched the observed affinity for small charged lig- 
ands. The observation of duplexes for all CGG-containing 
structures was somewhat surprising to some researchers, 
who expected quadruplexes by analogy with DNA. 

Another paper described an RNA duplex containing 
three consecutive CGG repeats with flanking sequences and 
5'-UU overhangs (Figure 1L) (73). Overall, the structure
of the CGG repeats is similar to that described above (72),
with G(syn)-G(anti) pairs and unusual torsion angles of the
G(syn) residues, resulting in a local unwinding of the helix
(PDB code 3js2). The authors noted that the resultant
widening of the major groove and the base-pair inclination 
near the G-G pairs resembled the A'-form of RNA. They 
also noted an interesting difference from the structures of 
other triplet repeats: the lack of ordered ions near the G-G 
pairs, even though several cations and sulphate anions were 
present in the crystallisation medium. 

CCG repeats

At present, there is one paper reporting the crystallographic 
structure of CCG repeats (74). It describes two oligomers: 
(GCCGCCGC)2 and (GCCG^LCCGC)2, in which G^L belongs
to the 'locked' (LNA) series (Figure 1M, N). The
oligomers formed duplexes in the crystal lattice, and again, 
the RNA has the A-form, but pairing of the strands was un- 
expected. In the unmodified oligomer, the strands slipped 
in the 5' direction (PDB code 4e59), whereas in the LNA- 
containing oligomer, there was a slippage in the 3' direction 
(PDB code 4e58). In both cases, the result was to reduce 
the number of C-C pairs from the expected two, if no slip- 
page had occurred, to one. Nevertheless, three instances of 
double-stranded CCG triplets were observed: one for un- 
modified RNA and two in crystallographically independent 
LNA-containing duplexes. Each of the three observed C-C 
pairs interacts differently, forming either one weak H-bond 
or none (Figure 2D). LNA has no apparent effect on heli- 
cal parameters, but base stacking is increased compared to 
the native duplex. It seems that C-C pairs contribute little 
to the stability of the duplex, which is why the system acts 
to eliminate them by strand slippage. These results are in 
agreement with the measured thermodynamic fragility of 
CCG repeats. C-C pairs within other known helical RNA 
structures are relatively rare, but if present, they also show 
conformational variability. One of the cytosine residues is 
shifted to various extents towards the minor groove and dif- 
ferent H-bonds are observed between the paired cytosines. 
The apparent weakness of the C-C interactions also sheds 
light on the observation that the MBNL1 protein, thought
to interact with single strands of RNA, recognizes CCG 
runs as well as CUG and CAG but not the relatively ro- 
bustly paired CGG repeats. 
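
The number of non-canonical pairs expected when no slippage occurs can be checked by pairing a strand with an antiparallel copy of itself. The Python sketch below does this for the blunt-ended register only. The helper names are hypothetical, G-U wobble pairs are deliberately not counted as canonical, and the slipped registers observed in the crystals are not modelled.

```python
# Canonical Watson-Crick pairs in RNA (wobble pairs are left out on purpose).
WATSON_CRICK = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}

def self_duplex_pairs(strand):
    """Pair residue i of one strand with residue n+1-i of an identical,
    antiparallel strand (blunt-ended register, 1-based numbering)."""
    n = len(strand)
    return [(i + 1, strand[i], n - i, strand[n - 1 - i]) for i in range(n)]

def non_canonical_pairs(strand):
    """Return the pairs of the unslipped self-duplex that are not Watson-Crick."""
    return [p for p in self_duplex_pairs(strand)
            if (p[1], p[3]) not in WATSON_CRICK]

for oligo in ("GCCGCCGC", "GCUGCUGC", "GGCAGCAGCC"):
    mismatches = non_canonical_pairs(oligo)
    print(oligo, len(mismatches), mismatches)
```

For GCCGCCGC the unslipped register gives two C-C pairs, which is the 'expected two' referred to above; the same count of two N-N pairs is obtained for the CUG and CAG oligomers.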

NMR STUDIES 

There are relatively few published works describing NMR 
studies of RNA TNRs. 

The earliest paper presents an analysis of a run of 97
CUG repeats in the solid state (75). The authors noted the presence
of canonical C-G pairs and observed resonances consistent
with an A-form helix with a C3'-endo sugar pucker
and an anti conformation of the glycosidic torsion angle. 
Recent solution NMR work on a single CUG with flanking
sequences, to stabilize the duplex form, was also consistent
with an A-form geometry with a C3'-endo sugar pucker
(PDB codes 2l8c, 2l8u and 2l8w) (76). The line broadening
and temperature profile of the spectrum indicated a structurally
dynamic U-U pair. When the NMR model was subjected
to molecular dynamics simulation, the U-U pair was
found to adopt conformations with zero, one or two hydro- 
gen bonds, of which the most stable was the structure with 
one H-bond. These results are essentially in agreement with 
the crystallographic studies described above. 

One paper describes a study of CGG-repeat RNA in solu- 
tion (77). One short duplex and two hairpins, each predicted 
to contain three CGG repeats, were investigated. The duplex
gave an ambiguous spectrum and most likely formed
overlapping duplexes longer than predicted. The authors
observed no patterns characteristic of quadruplexes. The 
oligomers designed to fold into hairpins gave spectra in- 
dicating that hairpins indeed formed and that the non- 
canonical G-G pairs were located between flanking C-G 
and G-C pairs. The G-G pairs appeared dynamic with some 




Figure 5. No C-C pairs are formed in the solution NMR structure of 
(CCGCCG)2 DNA because the cytosine residues hang out or bulge out 
of the double helix. 

indication of symmetric G-imino-G-imino interactions, but 
glycosidic bond angles necessary for such an interaction 
could not be determined due to severe line broadening and 
signal overlap. The authors again stressed that there was, 
nevertheless, no evidence of tetraplex formation. 

Only one three-dimensional structure of DNA containing
CNG repeats has been published (78). The solution structure
of d(CCGCCG)2 was solved using NMR (PDB code
1noq). The double helix contained only G-C and C-G pairs.
The C-C pairs were eliminated by strand slippage, which 
resulted in dangling cytosine residues at the 5' end of each
strand. In addition, both C4 residues bulged out, causing
deformation of the phosphate backbone (Figure 5). The struc-
ture shows similarity with the crystallographic models in 
that the unstable C-C pairs have been eliminated. 

THERMODYNAMIC PROPERTIES OF CNG REPEATS 
VERSUS CRYSTALLOGRAPHIC DATA 

Thermodynamic studies have shown that RNA oligomers 
containing 2-3 repeats form duplexes, while oligomers with 
4-5 repeated units exist as a mixture of duplexes and hair- 
pins. Longer RNA molecules form only hairpins (79). Du- 
plexes of G(CNG)2^C oligomers have comparable thermo- 
dynamic stability. Oligomers of CGG repeats have the lowest
free energy (ΔG°37, approx. −10 kcal/mol), followed
by comparable oligomers of CUG repeats (−7.5 kcal/mol),
then CAG and CCG (−6.5 kcal/mol). Hairpin structures con-
taining 5-7 CNG units also have comparable stability, but 
the most stable repeats are CUG, followed by CAG, then 
CGG and CCG. However, such oligomers form a mixture 
of structures and deconvolution of the melting curves is 
necessary to obtain the thermodynamic parameters of each 
type of structure. The results for longer RNA molecules 
containing 20 repeats agree with those obtained for shorter 
oligomers (55). Values of ΔG°37 range from −2.38 to −6.68
kcal/mol. In 100 mM NaCl, the most thermodynamically 
stable were CGG, then CAG, CUG and CCG repeats. How- 



8196 Nucleic Acids Research, 2014, Vol. 42, No. 13 



ever, measurements performed in 100 mM KCl gave different
results. The stability series began with CAG repeats, followed
by CGG, CUG and CCG. The biggest drop in thermal
stability was observed for the CGG repeats. The destabilising
effect on other TNR RNA was lower: in the range
of 0.18-0.69 kcal/mol. The loss of stability in the presence
of K+ ions was also an indication that none of the RNA
formed tetraplexes, which are generally promoted by the
presence of K+.

Thermodynamic stability is an important property of nu- 
cleic acids and is difficult to investigate using crystallogra- 
phy, although some estimations are possible based on the 
number of H-bonds in the N-N pairs or the area of stacking 
interactions. Considering the aforementioned parameters in 
the crystallographic structures of TNR RNA, the CGG re- 
peats appear the most stable. The G-G pairs form the largest 
number of hydrogen bonds (in addition to two H-bonds be- 
tween the paired guanines, the residue in the syn confor- 
mation bonds to the phosphate group of its own strand). The
CUG repeats form the second most stable structure due 
to the one regular H-bond within the U-U pair. The A-A 
pairs within the CAG repeats interact via one weak C-H···N
hydrogen bond. In the case of the CCG repeats, the sys- 
tem limits the number of non-canonical pairs, which do not 
readily form H-bonds. However, the thermodynamic data 
for duplexes of (GCUGCUGC)2 and (GCCGCCGC)2
show interesting properties. RNA containing CCG repeats
is more stable (ΔG°37 = −6.09 kcal/mol) than CUG repeats
(ΔG°37 = −5.08 kcal/mol). For longer oligomers (3 and 4
repeated units), the stability increases for CUG (ΔG°37 >
7 kcal/mol), while for CCG, it remains nearly the same.
This effect is difficult to explain, but in the CUG repeats 
structure, this can be due to limited stacking interactions 
that increase with oligomer length. In the case of the CCG 
repeats, this suggests that strand slippage occurs, as observed
in the crystallographic models. One of the C-C pairs
is eliminated, giving energetic gain for the short RNA. For 
longer oligomers, the system is most likely not able to reduce 
the number of non-canonical pairs destabilising the RNA 
structure. 
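
To put the free energies quoted above on a more intuitive scale, the standard relation K = exp(−ΔG°/RT) can be used to convert them into equilibrium constants at 37°C. The following Python sketch is illustrative only and is not taken from the cited studies: the function names are hypothetical, the two ΔG°37 values are those quoted in the text for the (GCCGCCGC)2 and (GCUGCUGC)2 duplexes, and simple two-state behaviour is assumed.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
T_37C = 310.15         # 37 degrees C expressed in kelvin


def equilibrium_constant(delta_g_kcal, temperature=T_37C):
    """Return K = exp(-dG/RT) for a standard free energy in kcal/mol."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature))


def stability_ratio(dg_a, dg_b, temperature=T_37C):
    """Return K_a / K_b, i.e. how strongly state 'a' is favoured over 'b'."""
    return math.exp(-(dg_a - dg_b) / (R_KCAL * temperature))


# Values quoted in the text (kcal/mol); two-state behaviour is an assumption.
DG_CCG_DUPLEX = -6.09
DG_CUG_DUPLEX = -5.08

if __name__ == "__main__":
    ratio = stability_ratio(DG_CCG_DUPLEX, DG_CUG_DUPLEX)
    print(f"K(CCG)/K(CUG) at 37 C: {ratio:.1f}")
```

Under these assumptions, the difference of about 1 kcal/mol between the two short duplexes corresponds to roughly a fivefold difference in equilibrium constant, which illustrates how small the energetic margins between the CNG variants are.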

Interestingly, in our experience, the ease of crystallisa- 
tion and the crystal reproducibility of the oligomer cor- 
responded to the thermodynamic properties of the par- 
ticular type of CNG repeat. The CGG oligomers crystal- 
lized rapidly, but it was difficult to obtain monocrystals. 
Most likely, the conformational flexibility of the G-G pair 
was introducing disorder that was overcome by seeding. In 
the case of CUG, good crystals appeared within several 
days. The oligomers of the CAG and CCG repeats were 
the most challenging for crystallisation. The crystal repro- 
ducibility was random and attempts to optimize crystallisa- 
tion did not give satisfactory results. It seems that serendipi- 
tous changes in the fragile equilibrium in the crystallisation 
drops enabled us to obtain single crystals. 

CONCLUSIONS 

All CNG repeats fold into hairpins in which the double- 
stranded stems have non-canonical N-N pairs that are stabi- 
lized by the sturdy C-G and G-C pairs. The A-form prevails 
albeit with some deviations characteristic of the given N-N



pair. All CNG duplexes have been characterized in crystal- 
lographic detail in terms of their base-pairing, interactions 
with the solvent and small ligands, detailed helical param- 
eters and deviations from the canonical A-helix as well as 
their electrostatic potential and potential to form hydrogen 
bonds. Their three-dimensional structural profiles are de- 
tailed enough to serve as a basis for screening compound 
libraries for potentially useful ligands and for structure- 
based drug design. 

PERSPECTIVES 

The time has come to move on to more complex structures. To
begin with, the picture of hairpins formed by CNG repeats 
is incomplete. Although detailed structures of their double- 
stranded stems have been solved, we still do not know the 
structure of the apical loops. Second, to understand the tox- 
icity of the expanded CNG repeats, one would need to ex- 
amine their interactions with other molecules and their role 
in altered gene splicing and the formation of nuclear foci. 
With this knowledge, one could search in a rational way 
for ligands that would bind the CNG runs to mitigate their
deleterious effects. 

This question is addressed in a paper reporting the crys- 
tallographic complex of zinc finger domains from the alter- 
native splicing regulator protein MBNL1 and CGCUGU
(80). The model shows the protein interacting with a short 
single-stranded RNA, in particular with the GC step in its 
sequence. Inspection of the atomic coordinates and the elec- 
tron density calculated using the structure factors deposited 
with the PDB (code 3d2s) reveals unexplained degradation 
of the RNA. From a crystallographer's perspective, it re-
veals a puzzling, nearly perfect match between pairs of pro- 
tein molecules and the associated RNA chains, which are 
related by a translation of half a unit cell along the crys- 
tallographic a-axis. Further studies are clearly necessary to 
elucidate the interactions of CNG repeats with their protein 
partners. 

Another clear aim for further studies is a detailed char- 
acterisation of complexes between CNG repeats and com- 
pounds that could bind them specifically and, hopefully, re- 
verse the precipitation of nuclear foci or prevent their for- 
mation. Several classes of compounds have been proposed 
as potential therapeutics (81-84). Examining their interac- 
tions with CNG runs would help verify their utility and en- 
able their refinement. 

APPENDIX 

Examining the atomic coordinates and the correspond- 
ing electron density is a useful addition to reading crys- 
tallographic papers, and it can be accomplished easily, 
even by non-crystallographers. Freely available applica- 
tions such as Coot
(www2.mrc-lmb.cam.ac.uk/Personal/pemsley/coot)
automatically download atomic models and
calculate electron density maps; one needs only to enter the 
PDB code to view this information. 
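
For readers who prefer scripting, the same coordinates can also be retrieved outside of Coot. The following Python sketch is not part of the workflow described above. It assumes the RCSB download address pattern `https://files.rcsb.org/download/<code>.pdb`, and `rcsb_coordinate_url` and `fetch_coordinates` are hypothetical helper names introduced only for illustration.

```python
import urllib.request

RCSB_DOWNLOAD = "https://files.rcsb.org/download"  # assumed address pattern


def rcsb_coordinate_url(pdb_code):
    """Build the download URL for a classic four-character PDB code."""
    code = pdb_code.strip().lower()
    # Classic PDB codes are four alphanumeric characters starting with a digit.
    if len(code) != 4 or not code.isalnum() or not code[0].isdigit():
        raise ValueError(f"not a valid PDB code: {pdb_code!r}")
    return f"{RCSB_DOWNLOAD}/{code}.pdb"


def fetch_coordinates(pdb_code, timeout=30):
    """Download the coordinate file and return its text (needs network access)."""
    url = rcsb_coordinate_url(pdb_code)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # 3d2s is the MBNL1-RNA complex discussed in the Perspectives section.
    print(rcsb_coordinate_url("3d2s"))
```

The format check is deliberately strict, so a mistyped code is rejected before any network request is made. Electron density maps still need to be calculated from the deposited structure factors, which Coot does automatically.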